Resilire: Achieving High Availability at the Virtual Machine Level
Abstract
High availability is a critical feature of data centers, cloud, and cluster computing environments. Replication is a classical approach to increasing service availability by providing redundancy. However, traditional replication methods are increasingly unattractive for deployment because of several limitations. These include lack of application-level transparency, lack of isolation between applications (which causes security vulnerabilities), complex system management, and high cost. Virtualization overcomes these limitations through an additional layer of abstraction and provides high availability through live virtual machine (VM) migration. A guest VM image running on a primary host is transparently checkpointed and migrated to a backup host, usually at a high frequency, without pausing the VM. When a failure occurs, the VM is resumed on the backup from the latest checkpoint. A virtual cluster (VC) generalizes the VM concept to distributed applications and systems: a VC is a set of VMs deployed on different physical machines and connected by a virtual network. In this dissertation, we first reduce live VM migration downtime through a memory synchronization technique called FGBI. FGBI tracks memory at block granularity, which reduces the dirty memory updates that must be migrated during each migration epoch. It also identifies memory blocks with identical content and shares them, offsetting the extra memory overhead of block-level tracking. Finally, it applies a hybrid compression mechanism to dirty blocks to reduce migration traffic. We implement FGBI in Xen and compare it with two state-of-the-art VM migration solutions, LLM and Remus, on benchmarks including the Apache web server and the SPEC benchmark suite. Our experimental results show that FGBI reduces downtime by up to ≈77% compared with LLM and by up to ≈45% compared with Remus, with a performance overhead of ≈13%.
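The block-granularity idea behind FGBI can be illustrated with a short Python sketch. This is not FGBI's implementation inside Xen, and several pieces are assumptions:

- the 64-byte block size;
- SHA-1 content hashing to find identical blocks;
- an "XOR delta, then zlib, else send raw" rule standing in for the unspecified hybrid compression.

All function names here are hypothetical. The sketch compares one page at two epochs and keeps only the blocks that changed.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

PAGE_SIZE = 4096   # assumed x86 page size in bytes
BLOCK_SIZE = 64    # hypothetical tracking granularity; FGBI's real block size may differ

def dirty_blocks(old_page: bytes, new_page: bytes,
                 block_size: int = BLOCK_SIZE) -> List[Tuple[int, bytes]]:
    """Return (block index, new contents) for every block that changed this epoch."""
    assert len(old_page) == len(new_page)
    changed = []
    for off in range(0, len(new_page), block_size):
        if old_page[off:off + block_size] != new_page[off:off + block_size]:
            changed.append((off // block_size, new_page[off:off + block_size]))
    return changed

def dedup_blocks(blocks: List[Tuple[int, bytes]]) -> Tuple[Dict[str, bytes], List[Tuple[int, str]]]:
    """Keep one copy per distinct block content; every block becomes a reference to it."""
    store: Dict[str, bytes] = {}
    refs: List[Tuple[int, str]] = []
    for idx, data in blocks:
        digest = hashlib.sha1(data).hexdigest()
        store.setdefault(digest, data)   # first block with this content is stored
        refs.append((idx, digest))       # each block points at its stored copy
    return store, refs

def encode_block(old_block: bytes, new_block: bytes) -> Tuple[str, bytes]:
    """Assumed 'hybrid' rule: XOR against the previous contents and zlib-compress
    the delta, falling back to the raw block when compression does not help."""
    delta = bytes(a ^ b for a, b in zip(old_block, new_block))
    packed = zlib.compress(delta)
    if len(packed) < len(new_block):
        return "xor+zlib", packed
    return "raw", new_block

def decode_block(old_block: bytes, kind: str, payload: bytes) -> bytes:
    """Invert encode_block on the backup host, which still holds old_block."""
    if kind == "raw":
        return payload
    delta = zlib.decompress(payload)
    return bytes(a ^ b for a, b in zip(old_block, delta))

# Usage: two blocks of one page are dirtied with identical contents.
old = bytes(PAGE_SIZE)
new = bytearray(old)
new[0:4] = b"ABCD"      # dirties block 0
new[128:132] = b"ABCD"  # dirties block 2 with the same contents as block 0
changed = dirty_blocks(old, bytes(new))
store, refs = dedup_blocks(changed)
kind, payload = encode_block(old[0:64], bytes(new[0:64]))
restored = decode_block(old[0:64], kind, payload)
print(len(changed), len(store), kind)  # 2 1 xor+zlib
```

With page-granularity tracking, the entire 4,096-byte page would be resent. At the assumed 64-byte granularity only two blocks are dirty, and because their contents match, a single stored copy suffices.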
We then present VPC, a lightweight, globally consistent checkpointing mechanism for VCs, which checkpoints the VC for immediate restoration after one or more VM failures. VPC predicts the checkpoint-caused page faults during each checkpointing interval in order to implement lightweight checkpointing for the entire VC. It also uses a globally consistent checkpointing algorithm that preserves the global consistency of the VMs' execution and communication states and saves only the memory pages updated during each checkpointing interval. Our experimental results show that VPC reduces single-VM downtime by up to ≈45% and whole-VC downtime by up to ≈50% compared with competitors including VNsnap, with a memory overhead of ≈9% and a performance overhead of ≈16%. The …
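The two properties attributed to VPC, incremental saving and global consistency, can be sketched in Python under simplifying assumptions. This is a coordinated (stop-the-world) variant, not VPC's algorithm:

- it assumes every VM is paused while `take_checkpoint` runs;
- it does not model VPC's page-fault prediction;
- `VM`, `Checkpoint`, `take_checkpoint`, and `restore` are hypothetical names.

Each VM saves only the pages written since the previous checkpoint. Packets still in the virtual network are recorded as channel state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Packet = Tuple[str, str, bytes]  # (source VM, destination VM, payload)

@dataclass
class VM:
    name: str
    pages: Dict[int, bytes] = field(default_factory=dict)
    dirty: Set[int] = field(default_factory=set)  # pages written since last checkpoint

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)

@dataclass
class Checkpoint:
    epoch: int
    deltas: Dict[str, Dict[int, bytes]]  # per-VM pages updated during this interval
    in_flight: List[Packet]              # sent but not yet delivered at checkpoint time

def take_checkpoint(epoch: int, vms: List[VM], network_queue: List[Packet]) -> Checkpoint:
    """Coordinated incremental checkpoint of a whole virtual cluster.

    Assumes all VMs are paused during this call, so no packet is delivered
    mid-checkpoint; saving the queued packets keeps the cut consistent."""
    deltas = {}
    for vm in vms:
        deltas[vm.name] = {p: vm.pages[p] for p in sorted(vm.dirty)}
        vm.dirty.clear()  # next interval starts with a clean dirty set
    return Checkpoint(epoch, deltas, list(network_queue))

def restore(checkpoints: List[Checkpoint],
            names: List[str]) -> Tuple[Dict[str, Dict[int, bytes]], List[Packet]]:
    """Rebuild each VM by replaying incremental deltas in epoch order."""
    memory: Dict[str, Dict[int, bytes]] = {n: {} for n in names}
    for cp in sorted(checkpoints, key=lambda c: c.epoch):
        for n, delta in cp.deltas.items():
            memory[n].update(delta)
    latest = max(checkpoints, key=lambda c: c.epoch)
    return memory, list(latest.in_flight)  # packets to re-deliver after resuming

# Usage
a, b = VM("vm-a"), VM("vm-b")
a.write(0, b"x")
b.write(3, b"y")
queue: List[Packet] = [("vm-a", "vm-b", b"hello")]  # sent by vm-a, not yet received
cp1 = take_checkpoint(1, [a, b], queue)
queue.clear()      # the packet is delivered after checkpoint 1
a.write(0, b"x2")  # only vm-a changes during the next interval
cp2 = take_checkpoint(2, [a, b], queue)
memory, pending = restore([cp1, cp2], ["vm-a", "vm-b"])
print(memory, pending)  # {'vm-a': {0: b'x2'}, 'vm-b': {3: b'y'}} []
```

Recording the queued packets matters. In checkpoint 1, vm-a's saved state already reflects sending the packet, while vm-b's does not yet reflect receiving it. Discarding that packet on restore would leave the two VMs mutually inconsistent.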
Similar resources
Semi-shared storage subsystem for OpenNebula
To address the limitations of OpenNebula storage subsystems, we have designed and developed an extension that is capable of achieving higher I/O throughput than the prior subsystems. The semi-shared storage subsystem uses central and distributed resources at the same time. Virtual machine instances with high availability requirements can run directly from central storage while other virtual mac...
Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology
With the pervasiveness of cloud computing, a colossal number of applications from large organizations increasingly rely on cloud services. These demands cause a great number of applications, in the form of requests for groups of virtual machines (VMs), to be executed on data center servers. Some applications are too large to be processed on a single VM. Also, there exists severa...
RetroVisor: Nested Virtualization for Multi IaaS VM Availability
Nested virtualization [1] provides an extra layer of virtualization to enhance security with a fairly reasonable performance impact. The user-centric vision of cloud computing gives a high level of control over the whole infrastructure [2], such as untrusted dom0 [3, 4]. This paper introduces RetroVisor, a security architecture to seamlessly run a virtual machine (VM) on multiple hypervisors simultaneou...
RAM analysis of earth pressure balance tunnel boring machines: A case study
Earth pressure balance tunnel boring machines (EPB-TBMs) are favorably applied in urban tunneling projects. Despite their numerous advantages, considerable delays and high maintenance cost are the main disadvantages these machines suffer from. Reliability, availability, and maintainability (RAM) analysis is a practical technique that uses failure and repair dataset obtained over a reasonable ti...
Lightweight Live Migration for High Availability
High availability is a critical feature for service clusters and cloud computing, and is often considered more valuable than performance. One commonly used technique to enhance the availability is live migration, which replicates services based on virtualization technology. However, continuous live migration with checkpointing will introduce significant overhead. In this paper, we present a lig...
Publication date: 2012